ABSTRACT

The United States Defense Mapping Agency is the federal organization
charged with maintaining foreign names information. The Foreign Place
Names File contains references to more than 3.5 million approved and 1.5
million variant names for foreign countries, undersea features, and
extraterrestrial features. The file is currently stored on index cards, because
existing computer equipment has not been capable of displaying and
processing the diacritics and special symbols characteristic of the extended Latin
alphabet.

In a joint effort with the US Army Engineer Topographic Laboratories
and the Illinois Institute of Technology Research Institute, the United
States Defense Mapping Agency is developing a prototype Names Input Station
to digitally process foreign names data. The station can input, output,
edit, and display diacritics and special symbols for the more than one hundred
romanized non-English languages required by the agency.


INTRODUCTION 


The United States Defense Mapping Agency (DMA) is the federal agency
responsible for maintaining names information on foreign
places, undersea features, and extraterrestrial features. DMA collects
and evaluates names data and works with the United States Board on Geographic
Names (BGN) to "develop policies, principles, and procedures governing the
use, spelling, and application of geographic names." (1) In addition, DMA
maintains the BGN Foreign Place Names File, which is used to produce
gazetteers, to validate information for hydrographic, topographic, and
aeronautical map and chart products, and as a resource for responding to inquiries
from other government agencies and private businesses or individuals.

The cornerstone of the names production process, the Foreign Place
Names File, consists of names, locations, descriptions, and source material
history for geographic features. (See Figure 1) Stored on 4" x 6"
index file cards, this massive file contains more than 3.5 million
BGN-approved names and over 1.5 million recorded variants. The present
manual system is unable to keep pace with ever-increasing demands for new
and updated names information. Initial attempts to automate the system
were hindered because existing computer equipment could not adequately
process and display the names data.

Geographic  names  information  at  DMA  is  unique  because  of  its  extended 
Latin  character  set.  Foreign  languages  based  on  the  Roman  alphabet  may  contain 
diacritics and special symbols. French, for example, has acute (´) and
grave (`) accents, circumflexes (ˆ), umlauts (¨), and cedillas (¸) for
diacritics, and the o e ligature (œ) and apostrophe (') for special symbols.
Also, non-Roman alphabets may be phonetically converted to Roman-based forms
using transliteration schemes. Thus, in Arabic, an [Arabic character] becomes an a. All told,
DMA  uses  one  hundred  and  sixteen  languages,  of  which  seventy-three  have  approved 
diacritics  or  special  symbols.  More  languages  use  diacritics  and  special 
symbols,  but  since  the  transliteration  schemes  are  not  approved,  they  are 
not  included  in  the  system.  Some  of  the  languages,  like  Chichewa,  spoken  in 
Malawi,  may  contain  a  single  diacritic,  but  most  have  more;  in  fact,  Vietnamese 
contains  fifty-four  diacritic  and  letter  combinations  and  thirteen  special 
symbols.  While  phototypesetting  has  long  permitted  the  printing  of  special 
symbols  and  diacritics  for  final  copy,  the  problem  for  the  names  specialist 
has  been  to  locally  display  the  special  symbols  or  characters  with  diacritics 
located in the correct position. Consider the problems in printing an "a"
with an acute accent (á). The earliest names information was printed in
uppercase letters with no diacritics (A). Next, lowercase letters could
be displayed, but there were still no diacritics (a). DMA currently has a
high speed printing system which can display some diacritics adjacent to their
associated character (a´). Ultimately, however, the goal has been to print
the diacritic in its proper position relative to its associated character (á),
and this can be done on the Names Input Station.

The  United  States  Defense  Mapping  Agency  tasked  the  US  Army  Engineer 
Topographic  Laboratories  (USAETL)  to  design  and  build  a  prototype  Names 
Input  Station  for  automated  names  work.  Development  is  being  accomplished 
under contract with the Illinois Institute of Technology Research
Institute (IITRI) and is scheduled for completion in early 1982.


NAMES INPUT STATION


The Names Input Station (NIS) was developed to input, output, edit,
and display names data. DMA outlined a number of features necessary for
handling foreign languages. First, the keyboard had to be designed to allow
an operator to enter a diacritic or special symbol with a single keystroke.
Second, the station had to display the special symbols and diacritics in
the proper position relative to their associated uppercase or lowercase alpha
characters. Third, the data had to be stored in ASCII format, so the
information could be processed by any computer using this format, whether or not
that computer could display the special symbols or diacritics. Fourth, the
station had to provide local hard copy of the data displayed, again with
correctly positioned diacritics.

The  Names  Input  Station  developed  by  IITRI  includes  three  hardware 
items:  an  ECD  Smart  ASCII  intelligent  terminal,  a  Per  Sci  floppy  disk 
drive,  and  a  Florida  Data  printer.  The  ECD  Smart  ASCII  was  selected  for 
its  flexibility  and  met  all  but  the  hard  copy  requirements.  In  addition 
to  a  standard  terminal  keyboard,  the  Smart  ASCII  has  two  outboard  keypads 
which  give  it  a  total  of  one  hundred  and  thirty-four  keys.  The  additional 
keys  are  necessary  for  the  placement  of  the  special  symbols  and  diacritics. 

A close-up of the left outboard keypad (See Figure 2) shows examples of
the diacritics located over a representative character. To enter a
character and diacritic, an operator would first strike the key with the
character on the central standard keyboard, and then move to the outboard
keypad and strike the key with the diacritic. The resulting character and
properly positioned diacritic will then appear on the cathode ray tube (CRT)
display to verify the input. Hard copies of the screen data may next be
printed on the ECD-modified Florida Data Model BNY matrix printer.


Figure 2

Left Outboard Keypad of Names Input Station

The diacritics are displayed with a sample character.
For example, the circumflex (^) over the e could also be placed
over an a, i, o, or u if the names specialist were entering
French names data.
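
The letter-then-diacritic entry sequence described above can be approximated on a modern system with Unicode combining marks. The sketch below is only an analogy and not the station's Translex software: the key names in DIACRITIC_KEYS and the functions compose and type_sequence are hypothetical, and Unicode normalization stands in for the station's own font-based positioning of the diacritic.

```python
import unicodedata

# Hypothetical names for outboard-keypad keys, mapped to Unicode combining marks.
DIACRITIC_KEYS = {
    "acute": "\u0301",
    "grave": "\u0300",
    "circumflex": "\u0302",
    "umlaut": "\u0308",
    "cedilla": "\u0327",
    "breve": "\u0306",
}


def compose(base: str, diacritic_key: str) -> str:
    """Return the base letter combined with the diacritic for the given key."""
    mark = DIACRITIC_KEYS[diacritic_key]
    # NFC normalization merges letter + combining mark into one precomposed
    # character when Unicode defines one.
    return unicodedata.normalize("NFC", base + mark)


def type_sequence(keystrokes):
    """Process keystrokes in order; a diacritic key modifies the previous letter."""
    out = []
    for key in keystrokes:
        if key in DIACRITIC_KEYS:
            if not out:
                raise ValueError("diacritic key struck before any letter")
            out[-1] = compose(out[-1], key)
        else:
            out.append(key)
    return "".join(out)


if __name__ == "__main__":
    # The operator strikes "e" on the central keyboard, then the circumflex key.
    print(type_sequence(["f", "o", "r", "e", "circumflex", "t"]))
```

As on the station, the letter is struck first and the diacritic second, so the diacritic key always acts on the character already entered.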

NAMES  INPUT  STATION  SOFTWARE 

The  Names  Input  Station  software  has  been  developed  by  the  Illinois 
Institute  of  Technology  Research  Institute  around  the  ECD  program  Translex, 
a  powerful  text  editing  and  word  processing  package.  IITRI  first  exploited  the 
user  definable  fonts  to  create  a  series  of  Regional  Diacritic  Sets  (REDS) 
containing  DMA's  required  special  symbols  and  diacritics.  IITRI  then  utilized 
the  Translex  macro-programming  facility  and  developed  software  routines  to 
meet  the  names  production  needs. 

The  creation  of  diacritics  and  special  symbols  is  central  to  the  Names 
Input  Station  software.  Each  individual  character  and  diacritic  is  defined 
by an 8 X 16 matrix of cells. (See Figure 3) These are edited and refined
to meet the cosmetic requirements of DMA. The special symbols and
diacritics are then assigned to keys on the outboard keypad. The keyboard may
be  redefined  at  any  point  to  change  or  expand  the  system  to  accommodate 
revised  or  newly  approved  transliteration  schemes.  This  flexibility  is 
highly  desirable  in  a  research  system. 
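
As a rough illustration of the 8 X 16 cell matrix, the sketch below stores each glyph as sixteen rows of eight cells and forms a letter with a diacritic by overlaying the two matrices. The cell patterns, the '#' and '.' notation, and all function names are assumptions made for this example; the source does not describe the station's internal font format.

```python
ROWS, COLS = 16, 8


def parse_glyph(rows):
    """Convert 16 strings of 8 cells ('#' = lit, '.' = dark) into 16 row bytes."""
    if len(rows) != ROWS or any(len(r) != COLS for r in rows):
        raise ValueError("glyph must be 16 rows of 8 cells")
    return [int(r.replace("#", "1").replace(".", "0"), 2) for r in rows]


def overlay(letter, diacritic):
    """Combine two glyphs cell by cell; a cell is lit if it is lit in either."""
    return [a | b for a, b in zip(letter, diacritic)]


def render(glyph):
    """Turn the 16 row bytes back into '#' and '.' text for inspection."""
    return "\n".join(
        format(row, "08b").replace("1", "#").replace("0", ".") for row in glyph
    )


# Illustrative patterns only: a breve in the top rows, an uppercase A below it.
BREVE = parse_glyph([
    ".#...#..",
    "..###...",
] + ["........"] * 14)

LETTER_A = parse_glyph(["........"] * 4 + [
    "..###...",
    ".#...#..",
    ".#...#..",
    ".#####..",
    ".#...#..",
    ".#...#..",
    ".#...#..",
] + ["........"] * 5)

A_BREVE = overlay(LETTER_A, BREVE)

if __name__ == "__main__":
    print(render(A_BREVE))
```

Keeping the diacritic in the upper rows and the letter in the lower rows lets one diacritic pattern be reused over any letter, which is one plausible reason for a tall 8 X 16 cell.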

The  diacritics  and  special  symbols  are  grouped  into  Regional  Diacritic 
Sets.  Each  REDS  contains  the  diacritics  and  special  symbols  for  a  number 
of  languages  within  geographically  related  countries.  There  are  fourteen 
REDS  encompassing  the  one  hundred  and  sixteen  languages  required  by  DMA. 

Figure 3

8 X 16 Matrix for Formation of a Single Character with Diacritic

The formation of an uppercase A with a breve (Ă) is illustrated above. When the
letter and diacritic are printed on the Florida Data printer, spaces between the cells on
the matrix are filled in to produce a smoother image.


A particular language may be found in more than one REDS, and most REDS
contain approximately seven languages. Thus, the REDS allow access to
the  diacritics  and  special  symbols  of  the  languages  of  a  country,  as  well  as 
neighboring  countries.  The  only  exceptions  to  the  rule  occur  in  the  Soviet 
Union  and  Vietnam.  The  Soviet  Union  contains  fifteen  major  languages  and 
was broken down into three separate REDS. In Vietnam, the only major
language is Vietnamese, but it contains such a large number of diacritics
and special symbols that its REDS contains only a single language. All
REDS  are  accessed  by  striking  a  REDS  command  key,  typing  the  name  of  the 
country  or  language  to  be  accessed,  and  striking  the  REDS  command  key  once 
again.  This  loads  the  REDS  so  the  appropriate  characters  appear  on  the  CRT 
display  and  hard  copy  printouts.  (See  Figure  4) 
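
The lookup behind the REDS command key can be pictured as a table from each set to the languages it covers. The sketch below is a minimal illustration, not IITRI's macro code: the set names and the function reds_for_language are hypothetical, the North European membership is taken from the whole font display caption in this paper, and only two of the fourteen sets are shown. The station also accepts a country name, which is not modeled here.

```python
# Two of the fourteen Regional Diacritic Sets; the dictionary keys are
# illustrative labels, not the names used on the station.
REDS = {
    "North European": {
        "Norwegian", "Swedish", "Danish", "Finnish", "Russian", "Estonian",
        "Latvian", "Polish", "German", "Lappish", "Lithuanian", "Faroese",
        "Greenlandic",
    },
    "Vietnam": {"Vietnamese"},
}


def reds_for_language(language: str) -> list[str]:
    """Return every REDS that covers the language, ignoring case and padding.

    A list is returned because a language may belong to more than one set.
    """
    wanted = language.strip().casefold()
    return sorted(
        name
        for name, languages in REDS.items()
        if wanted in {lang.casefold() for lang in languages}
    )


if __name__ == "__main__":
    print(reds_for_language("polish"))
    print(reds_for_language("Vietnamese"))
```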

Creating the individual characters and combining them into Regional
Diacritic Sets preceded the development of software to meet DMA's names
production needs. IITRI has also designed and implemented routines to
augment gazetteer production, compile name change lists, and serve as a
transliteration scratchpad.

DMA  produces  gazetteers  for  approximately  one  hundred  and  sixty-five 
countries,  and  each  gazetteer  contains  information  on  the  feature  name, 
feature  type  (stream,  mountain,  etc.),  latitude,  longitude,  UTM  coordinate 
location,  a  numeric  area  code,  and  the  Joint  Operations  Graphic  map  sheet 
number.  The  information  is  currently  in  both  analog  and  digital  form,  with 
the  digital  data  specially  coded  for  phototypesetting  operations  and  final 
gazetteer  production.  The  Names  Input  Station  will  augment  this  process  in 
two  ways.  First,  the  Florida  Data  printer  will  allow  the  names  specialists 
to produce interim gazetteers on-site with no time delay. (See Figure 5)
Second, the printer will provide local hard copy for data verification when
the gazetteer information is to be transferred to the phototypesetter.
Supplemental information used to support gazetteer production may be compiled
using the name change list or transliteration scratchpad software.


Figure 4

WHOLE FONT DISPLAY

The North European REDS contains all the DMA-specified special symbols and diacritics for the
languages of northern Europe. Languages or transliterated forms of languages covered include:
Norwegian, Swedish, Danish, Finnish, Russian, Estonian, Latvian, Polish, German, Lappish, Lithuanian,
Faroese, and Greenlandic.


Figure 5

South Vietnam, Official Standard Names Gazetteer, May 1971

An interim gazetteer does not have the printed
quality of a phototypeset product, but can be
rapidly produced locally by a names analyst.

The  name  change  list  (See  Figure  6)  contains  three  components:  the 
old  name,  new  name,  and  coordinate  location.  This  information  was  formerly 
entered  with  an  IBM  Selectric  typewriter  with  the  diacritics  later  added  by 
hand  in  a  separate  operation.  With  the  use  of  the  Names  Input  Station,  this 
can  be  done  with  a  single  typing  step. 
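
A name change list entry, with its three components, might be modeled as a small record. This is a sketch under stated assumptions: the field names, the decimal-degree storage, and the degrees-and-minutes output format are all hypothetical, and the coordinates in the example are rounded approximations used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameChange:
    old_name: str
    new_name: str
    latitude: float   # decimal degrees, north positive (assumed convention)
    longitude: float  # decimal degrees, east positive (assumed convention)


def format_coordinate(value: float, positive: str, negative: str) -> str:
    """Format decimal degrees as whole degrees and minutes plus a hemisphere letter."""
    hemisphere = positive if value >= 0 else negative
    total_minutes = round(abs(value) * 60)
    degrees, minutes = divmod(total_minutes, 60)
    return f"{degrees}°{minutes:02d}'{hemisphere}"


def format_entry(change: NameChange) -> str:
    """Render one line of a name change list: old name, new name, location."""
    lat = format_coordinate(change.latitude, "N", "S")
    lon = format_coordinate(change.longitude, "E", "W")
    return f"{change.old_name} -> {change.new_name}  {lat} {lon}"


if __name__ == "__main__":
    # Approximate coordinates, for illustration only.
    entry = NameChange("Saigon", "Thành Phố Hồ Chí Minh", 10.75, 106.6667)
    print(format_entry(entry))
```

Because the diacritics are part of the stored text, the entry is produced in one step, with no separate pass to add marks by hand.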

The  transliteration  scratchpad  function  allows  a  names  specialist  to 
enter  and  store  text  in  any  of  the  available  languages.  As  an  example, 
a  linguist  might  wish  to  record  supplementary  information  on  a  particular 
transliteration  scheme.  (See  Figure  7)  The  data  could  be  stored  in  a  file 
on  disk  and  accessed  for  reference  purposes  by  other  linguists. 

FUTURE  ROLE  OF  THE  NAMES  INPUT  STATION  AT  DMA 

The  Names  Input  Station  fills  both  short  term  and  long  term  needs  of 
the  Scientific  Data  Department  at  DMA.  In  the  short  term,  the  station 
addresses  production  needs  by  tying  into  the  existing  work  flow.  In  its 
long  term  and  more  important  role,  the  Names  Input  Station  will  serve  as  a 
research  tool  for  the  development  of  specifications  for  future  automated 
names  production,  in  particular,  the  creation  of  a  digital  Foreign  Place 
Names  File.  The  Names  Input  Station  will  be  used  to  develop  requirements 
for  future  data  entry  systems,  evaluate  the  problem  of  standardization  of 
diacritics and special symbols in an automated environment, define names
data formats, and analyze networking with other government and
non-government computer systems.


Figure 6

Name change lists are compiled to keep
the Foreign Place Names File up-to-date.


Footnotes 

1.  "United  States  Board  on  Geographic  Names,"  pamphlet  BGN/4/1980,  p.  1. 